To address the challenge that object relationships in videos change dynamically over time, a video dynamic scene graph generation model based on a multi-scale spatial-temporal Transformer was proposed. The idea of multi-scale modeling was introduced into the classic Transformer architecture to precisely model the dynamic fine-grained semantics of videos. First, in the spatial dimension, attention was paid not only to the global spatial correlations among objects, as in traditional models, but also to the local spatial correlations determined by the objects' relative positions. This helped the model understand the interaction dynamics between people and objects and led to more accurate semantic analysis results. Then, in the temporal dimension, in addition to the traditional short-term temporal correlations of objects in videos, the long-term temporal correlations of the same object pairs across the entire video were modeled. Comprehensive modeling of the long-term relationships between objects helped generate more accurate and coherent scene graphs and mitigated problems caused by occlusion, overlap, and similar effects during scene graph generation. Finally, through the cooperation of the spatial encoder and the temporal encoder, the model captured the dynamic fine-grained semantics of videos more accurately, avoiding the limitations inherent in traditional single-scale approaches. Experimental results on the Action Genome benchmark dataset show that, compared with the baseline model STTran, the proposed model improves Recall@10 by 5.0, 2.8, and 2.9 percentage points on the predicate classification, scene graph classification, and scene graph detection tasks, respectively. These results demonstrate that multi-scale modeling improves precision and effectively boosts performance in dynamic video scene graph generation.
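The four correlation scales described above (global and local spatial, short-term and long-term temporal) can be illustrated as attention masks. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: spatial locality is approximated by a hypothetical distance threshold `radius` on bounding-box centres, short-term context by a hypothetical frame `window`, long-term context by a hypothetical per-token `pair_ids` list naming the subject-object pair, and the two spatial scales are fused by simple averaging. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Mask = List[List[bool]]

def box_centre(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def global_spatial_mask(n: int) -> Mask:
    # Global spatial scale: every object in a frame attends to every other object.
    return [[True] * n for _ in range(n)]

def local_spatial_mask(boxes: Sequence[Box], radius: float) -> Mask:
    # Local spatial scale (assumed criterion): attend only to objects whose
    # box centres lie within `radius` of each other.
    centres = [box_centre(b) for b in boxes]
    return [[math.dist(ci, cj) <= radius for cj in centres] for ci in centres]

def short_term_mask(num_frames: int, window: int) -> Mask:
    # Short-term temporal scale: frame i attends to frames within +/- window.
    return [[abs(i - j) <= window for j in range(num_frames)] for i in range(num_frames)]

def long_term_mask(pair_ids: Sequence[int]) -> Mask:
    # Long-term temporal scale: a token attends to every token of the same
    # subject-object pair anywhere in the video, however far apart in time.
    return [[pi == pj for pj in pair_ids] for pi in pair_ids]

def masked_attention(q, k, v, mask: Mask) -> List[List[float]]:
    # Scaled dot-product attention; masked-out positions receive zero weight.
    # Every mask above keeps the diagonal, so each row has a finite maximum.
    scale = math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale if mask[i][j] else -math.inf
                  for j, kj in enumerate(k)]
        top = max(scores)
        exps = [math.exp(s - top) if s != -math.inf else 0.0 for s in scores]
        z = sum(exps)
        out.append([sum(e / z * vj[c] for e, vj in zip(exps, v)) for c in range(len(v[0]))])
    return out

def fuse_two_scales(x, mask_a: Mask, mask_b: Mask) -> List[List[float]]:
    # Assumed fusion rule: average the self-attention outputs of the two scales.
    a = masked_attention(x, x, x, mask_a)
    b = masked_attention(x, x, x, mask_b)
    return [[(p + r) / 2.0 for p, r in zip(ra, rb)] for ra, rb in zip(a, b)]
```

For boxes `(0, 0, 10, 10)`, `(5, 5, 15, 15)` and `(100, 100, 110, 110)` with `radius=20.0`, the first two objects are local neighbours while the third has none, so under the local mask it attends only to itself; under the global mask it still sees both others. The same pattern applies in time: `short_term_mask` limits context to nearby frames, while `long_term_mask` links every occurrence of one object pair regardless of distance.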
The traditional Point-wise Mutual Information (PMI) method has the shortcoming of overvaluing the co-occurrence of two low-frequency words. To find a proper value of k for an improved PMI, named PMIk, that overcomes this shortcoming, to solve the problem that terms cannot be reliably extracted from a segmented corpus containing segmentation errors, and to maintain the portability of the term extraction system, a new method combining PMIk with two fundamental rules was put forward to identify terms from an unsegmented corpus. Firstly, 2-gram extended seeds were determined by computing the bonding strength of two adjoining words with PMIk. Secondly, whether a 2-gram extended seed could be extended to a 3-gram was determined by computing the bonding strength between the seed and the word preceding it and between the seed and the word following it, respectively; multi-gram term candidates were then obtained iteratively. Finally, garbage strings among the term candidates were filtered out using the two fundamental rules to obtain the terms. Theoretical analysis shows that PMIk overcomes the shortcoming of PMI when k ≥ 3 (k ∈ N+). Experiments on a 1 GB SINA finance Blog corpus and a 300 MB Baidu Tieba corpus verify the theoretical analysis and show that PMIk outperforms PMI and has good portability.
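The abstract does not restate the PMIk formula. The sketch below assumes the usual definition from the collocation literature, PMI^k(x, y) = log2(p(x, y)^k / (p(x) p(y))), which reduces to ordinary PMI when k = 1. It treats each character of the unsegmented corpus as one unit, estimates every probability as a substring count divided by the corpus length (a simplification), and greedily grows seeds while the bonding strength stays at or above a hypothetical `threshold`. The two fundamental filtering rules are not described in the abstract and are therefore not sketched. All function names are illustrative, not the authors' code.

```python
import math
from typing import Set, Tuple

def pmi_k(joint: int, left: int, right: int, total: int, k: int = 1) -> float:
    # Assumed PMI^k definition: log2(p(x, y)^k / (p(x) * p(y))); k = 1 is plain PMI.
    p_xy, p_x, p_y = joint / total, left / total, right / total
    return math.log2(p_xy ** k / (p_x * p_y))

def count(corpus: str, s: str) -> int:
    # Overlapping occurrences of substring s in the unsegmented corpus.
    return sum(1 for i in range(len(corpus) - len(s) + 1) if corpus.startswith(s, i))

def bond(corpus: str, a: str, b: str, k: int) -> float:
    # Bonding strength between adjoining units a and b. Every probability is
    # estimated as count / len(corpus), a simplification for this sketch.
    joint = count(corpus, a + b)
    if joint == 0:
        return -math.inf
    return pmi_k(joint, count(corpus, a), count(corpus, b), len(corpus), k)

def neighbours(corpus: str, term: str) -> Tuple[Set[str], Set[str]]:
    # Characters seen immediately before and after each occurrence of term.
    left, right = set(), set()
    for i in range(len(corpus) - len(term) + 1):
        if corpus.startswith(term, i):
            if i > 0:
                left.add(corpus[i - 1])
            end = i + len(term)
            if end < len(corpus):
                right.add(corpus[end])
    return left, right

def extend(corpus: str, seed: str, threshold: float, k: int = 3, max_len: int = 8) -> str:
    # Grow the seed one unit at a time towards the best-bonded left or right
    # neighbour while the bonding strength stays >= threshold (hypothetical value).
    term = seed
    while len(term) < max_len:
        left, right = neighbours(corpus, term)
        options = [(bond(corpus, ch, term, k), ch + term) for ch in left]
        options += [(bond(corpus, term, ch, k), term + ch) for ch in right]
        if not options:
            break
        best_score, best_term = max(options)
        if best_score < threshold:
            break
        term = best_term
    return term

def extract_terms(corpus: str, threshold: float, k: int = 3) -> Set[str]:
    # Step 1: 2-gram seeds are adjoining units whose PMI^k reaches the threshold.
    seeds = {corpus[i:i + 2] for i in range(len(corpus) - 1)
             if bond(corpus, corpus[i], corpus[i + 1], k) >= threshold}
    # Step 2: extend each seed iteratively into a multi-gram candidate.
    # (Step 3, filtering with the two fundamental rules, is not sketched.)
    return {extend(corpus, s, threshold, k) for s in seeds}
```

The overvaluation is easy to see with `pmi_k`: in a 1024-unit corpus, a pair seen once whose parts each occur once scores 10 under plain PMI, higher than the 4 given to a pair whose parts occur 64 times and always together. With k = 3 the scores become -10 and -4, so the frequent pair ranks higher. On the toy corpus `"abc1abc2abc3abc4"` with `threshold=-3.0`, the seeds `"ab"` and `"bc"` both grow into the single candidate `"abc"`.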
A smart and green home is a dynamic, large-scale system with high complexity and a huge amount of information. To further improve the coordination between subsystems and make full use of multi-source information in the smart home, a multi-Agent intelligent home system based on multi-source information fusion was designed. The framework and interaction mechanisms of the Agents were introduced, and a multi-source information fusion model based on the Adaptive Neural-network-based Fuzzy Inference System (ANFIS) was put forward to perform feature extraction and learn the occupants' personal behavior. A simulation platform using lightweight embedded Jade Agents on Android and Matlab on a personal computer was developed to control the natural lighting system of the smart home. Theoretical analysis and simulation results show that the model improves the synergistic interaction of home systems and ultimately enhances the efficiency of multi-source information fusion in the decision-making process.
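The abstract names ANFIS as the fusion model but gives no inputs, membership functions, or rule base. As a minimal sketch of how a first-order Sugeno ANFIS computes its output, the code below runs the standard five-layer forward pass for two hypothetical normalised inputs (for example, an outdoor daylight level and an occupant-preference signal) with hand-picked Gaussian membership functions and constant rule consequents standing in for a hypothetical blind-opening target. Training is omitted: ANFIS parameters are normally tuned with Jang's hybrid learning rule (least squares for the consequent parameters, gradient descent for the premise parameters), which is not reproduced here.

```python
import math
from itertools import product
from typing import Sequence, Tuple

def gauss(x: float, c: float, sigma: float) -> float:
    # Layer 1: Gaussian membership degree of x in a fuzzy set centred at c.
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def anfis_forward(x: Sequence[float],
                  mfs: Sequence[Sequence[Tuple[float, float]]],
                  consequents: Sequence[Sequence[float]]) -> float:
    # mfs[i] lists (centre, sigma) pairs for input i; one rule per combination
    # of membership functions (grid partition), enumerated in itertools.product order.
    # consequents[r] = [p_1, ..., p_n, r_0]: rule output f_r = sum(p_i * x_i) + r_0.
    memberships = [[gauss(xi, c, s) for c, s in mf] for xi, mf in zip(x, mfs)]
    # Layer 2: firing strength of each rule = product of its membership degrees.
    strengths = [math.prod(combo) for combo in product(*memberships)]
    # Layer 3: normalised firing strengths.
    total = sum(strengths)
    norm = [w / total for w in strengths]
    # Layer 4: first-order Sugeno rule outputs.
    outputs = [sum(p * xi for p, xi in zip(cons[:-1], x)) + cons[-1] for cons in consequents]
    # Layer 5: overall output = normalised-strength-weighted sum of rule outputs.
    return sum(wn * f for wn, f in zip(norm, outputs))

# Hypothetical "low"/"high" fuzzy sets for each of two inputs on [0, 1].
MFS = [[(0.0, 0.2), (1.0, 0.2)], [(0.0, 0.2), (1.0, 0.2)]]
# Hypothetical constant consequents for rules (low,low), (low,high), (high,low), (high,high).
CONSEQUENTS = [[0.0, 0.0, 0.9], [0.0, 0.0, 0.6], [0.0, 0.0, 0.3], [0.0, 0.0, 0.1]]
```

With both inputs at 0.5 every rule fires equally, so `anfis_forward([0.5, 0.5], MFS, CONSEQUENTS)` is the plain mean of the four rule outputs (0.475); near `[0.0, 0.0]` the (low, low) rule dominates and the output approaches 0.9.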